Introduction

This helper notebook pulls a few basic statistics about your graph out of a Neo4j database. Start a local Neo4j instance with its default settings before running the notebook.

Setup

First, we open a connection to the Neo4j instance that holds the data. If your setup differs from the defaults, pass custom parameters such as the URL or port.
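A minimal sketch of a non-default setup, assuming a Bolt endpoint with password authentication. The host, port, user name, and password are placeholders, and `build_uri` and `connect` are hypothetical helper names that are not part of py2neo. Whether `py2neo.Graph` accepts the `user` and `password` keywords depends on the installed version, so treat them as an assumption.

```python
def build_uri(host="localhost", port=7687, scheme="bolt"):
    """Assemble a connection URI such as bolt://localhost:7687."""
    return "{}://{}:{}".format(scheme, host, port)


def connect(host="localhost", port=7687, user="neo4j", password="secret"):
    """Open a py2neo connection with custom settings (placeholder values)."""
    import py2neo  # imported lazily so build_uri works without py2neo installed
    # Assumption: the installed py2neo version accepts `user` and `password`.
    return py2neo.Graph(build_uri(host, port), user=user, password=password)
```

Calling `connect()` with your own values would then take the place of `py2neo.Graph()` in the cell below.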


In [9]:
import py2neo
import pandas as pd
graph = py2neo.Graph()
graph.dbms.kernel_version


Out[9]:
(3, 1, 3)

Let's get some numbers!

Nodes

Number of all Nodes


In [10]:
graph.data("MATCH (n) RETURN COUNT(n) AS NumberOfAllNodes")


Out[10]:
[{'NumberOfAllNodes': 23829}]
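The query returns a list with a single dictionary. If only the plain number is needed, it can be unpacked as in this small sketch. `single_value` is a hypothetical helper name, and the input is the result shown above.

```python
def single_value(rows):
    """Return the only value of a one-row, one-column result (a list of dicts)."""
    if len(rows) != 1 or len(rows[0]) != 1:
        raise ValueError("expected exactly one row with exactly one column")
    return next(iter(rows[0].values()))


# Result copied from the output above.
node_count = single_value([{'NumberOfAllNodes': 23829}])
```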

Nodes and their Labels


In [16]:
pd.DataFrame(graph.data("MATCH (n) RETURN labels(n) AS Labels, COUNT(n) AS LabelCount ORDER BY LabelCount DESC"))


Out[16]:
LabelCount Labels
0 14034 [Git, Change]
1 2847 [File, Git]
2 2143 [Xml, Attribute]
3 1335 [Xml, Element]
4 665 [File, Artifact, Maven, Container]
5 639 [Git, Commit]
6 327 [Java, Member, Method]
7 231 [Value, Java, Annotation]
8 215 [Xml, Text]
9 177 [Value, Property]
10 162 [Java, Parameter]
11 159 [File, Type, Java]
12 110 [Value, Array]
13 99 [Value, Primitive]
14 87 [Java, Field, Member]
15 76 [Java, Member, Constructor, Method]
16 62 [JUnit, TestCase]
17 57 [File]
18 45 [Author, Git]
19 40 [Java, Member, Method, Test, Junit4]
20 29 [Maven, Plugin]
21 29 [Concept]
22 28 [Maven, Configuration]
23 26 [File, Type, Java, Class]
24 23 [Maven, ExecutionGoal]
25 23 [File, Package, Container, Directory, Java]
26 19 [Maven, PluginExecution]
27 19 [Value, Enum]
28 15 [File, Container, Directory]
29 10 [Value, Class]
30 10 [Git, Branch]
31 10 [File, Xml, Document, JUnit, TestSuite]
32 8 [File, Type, Repository, Java, Class, Spring, ...
33 7 [Subdomain]
34 7 [Java, Member, Method, AssertJ, Assert]
35 6 [File, Type, Java, Class, Jpa, Entity]
36 5 [File, Type, Java, Interface]
37 5 [File, Type, Java, Class, Spring, Component, C...
38 5 [File, Java, Properties]
39 4 [Maven, Profile]
40 4 [File, Type, Repository, Java, Interface, Spri...
41 4 [Java, Member, Method, Spring, ManagedAttribute]
42 4 [File, Package, Container, Directory, Java, La...
43 3 [Xml, Namespace]
44 1 [File, Repository, Git]
45 1 [Git, Branch, Current]
46 1 [Pom, Maven]
47 1 [Repository, Maven]
48 1 [File, Type, Java, Class, Spring, ManagedResou...
49 1 [File, Package, Container, Directory, Java, Root]
50 1 [Java, Member, Method, Spring, Assert]
51 1 [File, Type, Java, Class, Spring, Component, S...
52 1 [File, Project, Maven, Container, Directory]
53 1 [File, Artifact, Maven, Container, Directory, ...
54 1 [File, Pom, Maven, Xml, Document]
55 1 [Java, Member, Method, Spring, ManagedOperation]
56 1 [File, Xml, Document]
57 1 [File, Container, Directory, Dependent]
58 1 [File, Artifact, Maven, Container, Directory, ...
59 1 [File, Container, Directory, TestReport, JUnit]
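The table above counts label combinations, so a label such as Git appears in several rows. A possible way to get one total per individual label is sketched below. `count_per_label` is a hypothetical helper, the sample rows are the three Git rows copied from the output, and the Cypher string is an assumed alternative that does the same aggregation inside the database.

```python
from collections import Counter

# Assumed alternative: let Cypher unwind the labels and count per label.
PER_LABEL_QUERY = (
    "MATCH (n) UNWIND labels(n) AS Label "
    "RETURN Label, COUNT(*) AS LabelCount ORDER BY LabelCount DESC"
)


def count_per_label(rows):
    """Sum the counts of all label combinations that contain each label."""
    totals = Counter()
    for row in rows:
        for label in row["Labels"]:
            totals[label] += row["LabelCount"]
    return totals


# Three rows copied from the output above (not the full result).
sample_rows = [
    {"Labels": ["Git", "Change"], "LabelCount": 14034},
    {"Labels": ["File", "Git"], "LabelCount": 2847},
    {"Labels": ["Git", "Commit"], "LabelCount": 639},
]
per_label = count_per_label(sample_rows)
```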

Relationships

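Number of all Relationships

Note: the undirected pattern ()-[r]-() used in the queries of this section matches a relationship between two distinct nodes once from each end, so the counts shown are twice the number of stored relationships. Use the directed pattern ()-[r]->() to count each relationship once.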


In [12]:
graph.data("MATCH ()-[r]-() RETURN COUNT(r) AS NumberOfAllRelationships")


Out[12]:
[{'NumberOfAllRelationships': 89926}]
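An undirected pattern such as ()-[r]-() finds a relationship between two distinct nodes from both of its ends. The toy example below, which uses made-up data rather than the graph in this notebook, shows the effect on the count. The directed query string at the end is a suggested variant that counts each relationship once.

```python
# Toy graph: three directed relationships between distinct nodes (made-up data).
edges = [("a", "b"), ("b", "c"), ("a", "c")]

# Directed pattern ()-[r]->(): one match per stored relationship.
directed_matches = [(start, end) for start, end in edges]

# Undirected pattern ()-[r]-(): each relationship also matches from its end node.
undirected_matches = directed_matches + [(end, start) for start, end in edges]

# Suggested directed variant of the query above.
DIRECTED_COUNT_QUERY = (
    "MATCH ()-[r]->() RETURN COUNT(r) AS NumberOfAllRelationships"
)
```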

Relationships and their Types


In [13]:
pd.DataFrame(graph.data("MATCH ()-[r]-() RETURN type(r) AS Type, COUNT(r) AS TypeCount ORDER BY TypeCount DESC"))


Out[13]:
Type TypeCount
0 CONTAINS_CHANGE 28068
1 MODIFIES 28068
2 HAS_FILE 5694
3 HAS_ATTRIBUTE 4286
4 HAS_ELEMENT 2646
5 INVOKES 2200
6 HAS_SIBLING 2192
7 DEPENDS_ON 1728
8 HAS_PARENT 1466
9 COMMITTED 1278
10 HAS_COMMIT 1278
11 CONTAINS 1218
12 MANAGES_DEPENDENCY 1194
13 OF_TYPE 1146
14 DECLARES 1098
15 HAS_LAST_CHILD 884
16 HAS_FIRST_CHILD 884
17 OF_NAMESPACE 624
18 HAS 622
19 RETURNS 494
20 ANNOTATED_BY 458
21 HAS_TEXT 430
22 READS 362
23 REQUIRES 318
24 THROWS 156
25 DECLARES_DEPENDENCY 148
26 WRITES 122
27 EXTENDS 112
28 BELONGS_TO 100
29 HAS_AUTHOR 90
30 HAS_PROPERTY 60
31 IS 58
32 IS_ARTIFACT 58
33 HAS_CONFIGURATION 56
34 USES_PLUGIN 50
35 HAS_GOAL 46
36 IMPLEMENTS 42
37 HAS_EXECUTION 38
38 USES 38
39 HAS_ROOT_ELEMENT 24
40 HAS_HEAD 24
41 HAS_BRANCH 22
42 DEFINES_DEPENDENCY 10
43 HAS_PROFILE 8
44 MANAGES_PLUGIN 8
45 DECLARES_NAMESPACE 6
46 CREATES 4
47 DESCRIBES 4
48 HAS_REPOSITORY 2
49 HAS_MODEL 2
50 HAS_EFFECTIVE_MODEL 2

Properties

Number of all Node Properties


In [14]:
graph.data("MATCH (n) RETURN SUM(SIZE(KEYS(n))) as NumberOfAllProperties")


Out[14]:
[{'NumberOfAllProperties': 44680}]
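The query above only looks at nodes. A sketch of how relationship properties could be included follows. `count_properties` is a hypothetical helper, and `run_query` is assumed to be any callable that takes a Cypher string and returns a list of dictionaries, as `graph.data` does in this notebook.

```python
NODE_PROPERTIES = "MATCH (n) RETURN SUM(SIZE(KEYS(n))) AS Total"
# Directed pattern, so that every relationship is visited once.
RELATIONSHIP_PROPERTIES = "MATCH ()-[r]->() RETURN SUM(SIZE(KEYS(r))) AS Total"


def count_properties(run_query):
    """Count properties on nodes and on relationships, plus their sum."""
    nodes = run_query(NODE_PROPERTIES)[0]["Total"]
    relationships = run_query(RELATIONSHIP_PROPERTIES)[0]["Total"]
    return {
        "nodes": nodes,
        "relationships": relationships,
        "all": nodes + relationships,
    }
```

With the connection from the setup cell, this could be called as `count_properties(graph.data)`.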

Occurrences of each Node Property


In [15]:
pd.DataFrame(graph.data("""
MATCH (n) WITH KEYS(n) as keys 
UNWIND keys as properties 
RETURN properties as Property, COUNT(properties) as PropertyCount
ORDER BY PropertyCount DESC"""))


Out[15]:
Property PropertyCount
0 modificationKind 14034
1 name 5288
2 relativePath 2847
3 createdAtEpoch 2701
4 createdAt 2701
5 value 2628
6 deletedAtEpoch 1859
7 deletedAt 1859
8 fqn 913
9 time 711
10 type 667
11 group 667
12 version 645
13 date 639
14 epoch 639
15 sha 639
16 message 639
17 author 639
18 signature 543
19 visibility 385
20 lastModificationAtEpoch 342
21 lastModificationAt 342
22 fileName 337
23 cyclomaticComplexity 253
24 effectiveLineCount 232
25 lastLineNumber 232
26 firstLineNumber 232
27 index 162
28 transient 76
29 volatile 76
... ... ...
37 inherited 48
38 identString 45
39 email 45
40 status 29
41 abstract 23
42 final 17
43 phase 13
44 standalone 12
45 characterEncodingScheme 12
46 xmlWellFormed 12
47 xmlVersion 12
48 synthetic 11
49 failures 10
50 tests 10
51 static 10
52 skipped 10
53 errors 10
54 groupId 3
55 uri 3
56 artifactId 3
57 packaging 3
58 prefix 2
59 releasesChecksumPolicy 1
60 url 1
61 releasesUpdatePolicy 1
62 snapshotsUpdatePolicy 1
63 layout 1
64 snapshotsChecksumPolicy 1
65 releasesEnabled 1
66 snapshotsEnabled 1

67 rows × 2 columns
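Because a node carries each property key at most once, every PropertyCount is also the number of nodes that have this property. Dividing by the total node count gives the share of nodes with each property, as in this sketch. `property_coverage` is a hypothetical helper, and the numbers are copied from the outputs above (Python 3 division is assumed).

```python
def property_coverage(property_counts, node_count):
    """Return the fraction of all nodes that carry each property key."""
    return {
        name: count / node_count
        for name, count in property_counts.items()
    }


# First three rows of the table above and the total node count from Out[10].
coverage = property_coverage(
    {"modificationKind": 14034, "name": 5288, "relativePath": 2847},
    23829,
)
```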